The Random Nature of Genome Architecture: Predicting Open Reading Frame Distributions
Abstract
BACKGROUND
A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes.
METHODOLOGY/PRINCIPAL FINDINGS
By fitting mixture models to data from whole-genome sequences, we show that the size-frequency distributions of ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60-80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and (ii) the size-frequency distributions of the remaining "non-random" ORFs are well fitted by log-normal or gamma distributions and resemble the size distributions of annotated proteins.
CONCLUSIONS/SIGNIFICANCE
Our findings suggest that stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomic units in both prokaryotes and eukaryotes.
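The abstract does not spell out the stochastic assembly model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' code. It assumes nucleotides are drawn independently, with G and C each at half the GC fraction and A and T each at half the remainder, and it treats an "ORF" as a stop-to-stop run of codons in a single reading frame (ignoring start codons and the other five frames). Under those assumptions the per-codon stop probability is p = a^3 + 2a^2 g, with a = (1 - GC)/2 and g = GC/2 (one term for TAA, two for TAG and TGA), and random ORF lengths follow a geometric law. All function names are hypothetical.

```python
import random

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_probability(gc: float) -> float:
    """Chance that a random codon is a stop codon, assuming independent
    nucleotides with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    a = (1.0 - gc) / 2.0
    g = gc / 2.0
    return a ** 3 + 2.0 * a ** 2 * g  # TAA + TAG + TGA


def orf_length_pmf(k: int, gc: float) -> float:
    """Geometric probability that a stop-to-stop run holds exactly k codons."""
    p = stop_probability(gc)
    return (1.0 - p) ** k * p


def expected_orf_length(gc: float) -> float:
    """Mean of the geometric law above, in codons."""
    p = stop_probability(gc)
    return (1.0 - p) / p


def random_sequence(n: int, gc: float, rng: random.Random) -> str:
    """Draw n independent nucleotides with the given GC fraction."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
    return "".join(rng.choices("AGCT", weights=weights, k=n))


def stop_to_stop_lengths(seq: str) -> list[int]:
    """Codon counts between consecutive in-frame stop codons (frame 0 only)."""
    lengths = []
    run = None  # stays None until the first stop codon is seen
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if run is not None:
                lengths.append(run)
            run = 0
        elif run is not None:
            run += 1
    return lengths


if __name__ == "__main__":
    rng = random.Random(42)
    seq = random_sequence(300_000, 0.5, rng)
    observed = stop_to_stop_lengths(seq)
    print(f"predicted mean: {expected_orf_length(0.5):.2f} codons")  # 20.33
    print(f"simulated mean: {sum(observed) / len(observed):.2f} codons")
```

At 50% GC, 3 of the 64 codons are stops, so the predicted mean run length is 61/3 (about 20.3 codons), and the simulated mean should land close to it. Raising GC content lowers the stop probability, so this simple null model expects longer random ORFs in GC-rich genomes.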
Similar resources
Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences.
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library o...
The nucleotide sequence of Saccharomyces cerevisiae chromosome IX.
Large-scale systematic sequencing has generally depended on the availability of an ordered library of large-insert bacterial or viral genomic clones for the organism under study. The generation of these large insert libraries, and the location of each clone on a genome map, is a laborious and time-consuming process. In an effort to overcome these problems, several groups have successfully demon...
Tying Down Loose Ends in the Chlamydomonas Genome: Functional Significance of Abundant Upstream Open Reading Frames.
The Chlamydomonas genome has been sequenced, assembled, and annotated to produce a rich resource for genetics and molecular biology in this well-studied model organism. The annotated genome is very rich in open reading frames upstream of the annotated coding sequence ('uORFs'): almost three quarters of the assigned transcripts have at least one uORF, and frequently more than one. This is proble...
Nature, Politics and Architecture; Reading Out the Interaction of Nature, Politics and Culture Components in the Architecture Creating Process of Tabriz Blue Mosque
Tabriz Blue Mosque is a valuable historical monument from the 9th century AH, built under the rule of the Kara Koyunlu Turkomans in northwestern Iran, about 35 years before the beginning of the Safavid Iranian government. The building has features that distinguish it from other monuments of the Azerbaijan region and even of Iran. These features have attracted th...
Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes.
A substantial fraction of hypothetical open reading frames (ORFs) in completely sequenced bacterial genomes are short, suggesting that many are not genes but random stretches of DNA. Although it is not feasible to authenticate the coding capacity of all such regions experimentally, comparisons of ORFs in related genomes can expose those that encode functional proteins.
Journal title:
Volume 4
Pages: -
Publication date: 2009